Overview

Dataset statistics

Number of variables: 34
Number of observations: 1100
Missing cells: 2316
Missing cells (%): 6.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 292.3 KiB
Average record size in memory: 272.1 B
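The two derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 KiB = 1024 B; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 34
n_observations = 1100
missing_cells = 2316
total_size_kib = 292.3

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both results round to the values the report shows (6.2% and 272.1 B).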

Variable types

Numeric: 8
Categorical: 23
Boolean: 3

Warnings

C10 has constant value "False" (Constant)
C15 has constant value "0.0" (Constant)
C17 has constant value "1.0" (Constant)
C30 has constant value "True" (Constant)
C1 is highly correlated with C32 (High correlation)
C4 is highly correlated with C9 and 3 other fields (High correlation)
C9 is highly correlated with C4 and 2 other fields (High correlation)
C16 is highly correlated with C32 (High correlation)
C25 is highly correlated with C4 and 3 other fields (High correlation)
C27 is highly correlated with C32 (High correlation)
C29 is highly correlated with C32 (High correlation)
C31 is highly correlated with C4 and 2 other fields (High correlation)
C32 is highly correlated with C1 and 5 other fields (High correlation)
C1 is highly correlated with C32 (High correlation)
C4 is highly correlated with C9 and 3 other fields (High correlation)
C9 is highly correlated with C4 and 2 other fields (High correlation)
C25 is highly correlated with C4 and 3 other fields (High correlation)
C27 is highly correlated with C32 (High correlation)
C29 is highly correlated with C32 (High correlation)
C31 is highly correlated with C4 and 2 other fields (High correlation)
C32 is highly correlated with C1 and 4 other fields (High correlation)
ID is highly correlated with Class and 4 other fields (High correlation)
Class is highly correlated with ID and 4 other fields (High correlation)
C1 is highly correlated with C15 and 3 other fields (High correlation)
C4 is highly correlated with C15 and 4 other fields (High correlation)
C9 is highly correlated with C15 and 4 other fields (High correlation)
C15 is highly correlated with ID and 6 other fields (High correlation)
C16 is highly correlated with C17 and 2 other fields (High correlation)
C17 is highly correlated with ID and 8 other fields (High correlation)
C19 is highly correlated with C23 and 1 other field (High correlation)
C20 is highly correlated with C29 and 1 other field (High correlation)
C23 is highly correlated with ID and 4 other fields (High correlation)
C25 is highly correlated with C4 and 2 other fields (High correlation)
C27 is highly correlated with C9 and 2 other fields (High correlation)
C29 is highly correlated with ID and 12 other fields (High correlation)
C31 is highly correlated with C9 (High correlation)
C32 is highly correlated with C1 and 5 other fields (High correlation)
C7 is highly correlated with Class and 2 other fields (High correlation)
C20 is highly correlated with C11 and 1 other field (High correlation)
C14 is highly correlated with C11 and 2 other fields (High correlation)
C11 is highly correlated with C20 and 10 other fields (High correlation)
Class is highly correlated with C7 and 1 other field (High correlation)
C12 is highly correlated with C7 and 2 other fields (High correlation)
C27 is highly correlated with C11 and 1 other field (High correlation)
C25 is highly correlated with C11 and 4 other fields (High correlation)
C19 is highly correlated with C11 and 1 other field (High correlation)
C22 is highly correlated with C14 and 1 other field (High correlation)
C13 is highly correlated with C32 (High correlation)
C31 is highly correlated with C25 and 3 other fields (High correlation)
C9 is highly correlated with C25 and 3 other fields (High correlation)
C24 is highly correlated with C32 and 1 other field (High correlation)
C26 is highly correlated with C32 and 1 other field (High correlation)
C28 is highly correlated with C11 and 1 other field (High correlation)
C16 is highly correlated with C32 (High correlation)
C5 is highly correlated with C32 (High correlation)
C32 is highly correlated with C7 and 24 other fields (High correlation)
C1 is highly correlated with C11 and 1 other field (High correlation)
C3 is highly correlated with C11 and 1 other field (High correlation)
C4 is highly correlated with C11 and 4 other fields (High correlation)
C8 is highly correlated with C11 and 2 other fields (High correlation)
C29 is highly correlated with C32 (High correlation)
C6 is highly correlated with C32 (High correlation)
ID is highly correlated with C11 and 1 other field (High correlation)
C23 is highly correlated with C32 (High correlation)
C21 is highly correlated with C26 and 1 other field (High correlation)
Class has 100 (9.1%) missing values (Missing)
C11 has 1095 (99.5%) missing values (Missing)
C32 has 1095 (99.5%) missing values (Missing)
ID is uniformly distributed (Uniform)
C32 is uniformly distributed (Uniform)
ID has unique values (Unique)
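One way to act on the Constant and Missing warnings is to list the columns that carry little or no usable information. The sketch below is illustrative only: the helper `columns_to_review` and the 0.95 missing-fraction threshold are assumptions made for this example, not part of the report or of pandas-profiling.

```python
def columns_to_review(constant_cols, missing_counts, n_rows, max_missing_fraction=0.95):
    """Return columns that are constant or exceed the missing-value threshold.

    Hypothetical helper; the threshold is an arbitrary illustrative choice.
    """
    mostly_missing = {
        col
        for col, n_missing in missing_counts.items()
        if n_missing / n_rows > max_missing_fraction
    }
    return sorted(set(constant_cols) | mostly_missing)


# Values transcribed from the Warnings list.
constant_cols = ["C10", "C15", "C17", "C30"]
missing_counts = {"Class": 100, "C11": 1095, "C32": 1095}

flagged = columns_to_review(constant_cols, missing_counts, n_rows=1100)
print(flagged)
```

With these inputs, Class (9.1% missing) stays below the threshold, while C11 and C32 (99.5% missing) are flagged alongside the four constant columns.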

Reproduction

Analysis started: 2021-09-16 03:34:25.302246
Analysis finished: 2021-09-16 03:34:45.664343
Duration: 20.36 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

ID
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 1100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 550.5
Minimum: 1
Maximum: 1100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 55.95
Q1: 275.75
Median: 550.5
Q3: 825.25
95-th percentile: 1045.05
Maximum: 1100
Range: 1099
Interquartile range (IQR): 549.5

Descriptive statistics

Standard deviation: 317.6869528
Coefficient of variation (CV): 0.577088016
Kurtosis: -1.2
Mean: 550.5
Median Absolute Deviation (MAD): 275
Skewness: 0
Sum: 605550
Variance: 100925
Monotonicity: Strictly increasing
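Because ID is unique, strictly increasing, and runs from 1 to 1100, its statistics can be cross-checked by hand. The sketch below assumes ID is exactly the integers 1..1100 and that the report uses the sample variance (n - 1 denominator) and linearly interpolated quantiles; the `quantile` helper is written for this example.

```python
import statistics

# Assumption: ID is exactly the integers 1..1100.
ids = list(range(1, 1101))

mean = statistics.mean(ids)
variance = statistics.variance(ids)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(ids)
cv = std_dev / mean                   # coefficient of variation


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (illustrative helper)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


print("Sum:", sum(ids))
print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 7))
print("CV:", round(cv, 9))
print("Q1, Q3:", quantile(ids, 0.25), quantile(ids, 0.75))
```

Under these assumptions the sum, mean, variance, standard deviation, CV, and quartiles agree with the values listed in the report.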
Histogram with fixed size bins (bins=50)

Value                  Count   Frequency (%)
1                      1       0.1%
732                    1       0.1%
738                    1       0.1%
737                    1       0.1%
736                    1       0.1%
735                    1       0.1%
734                    1       0.1%
733                    1       0.1%
731                    1       0.1%
757                    1       0.1%
Other values (1090)    1090    99.1%

Value   Count   Frequency (%)
1       1       0.1%
2       1       0.1%
3       1       0.1%
4       1       0.1%
5       1       0.1%
6       1       0.1%
7       1       0.1%
8       1       0.1%
9       1       0.1%
10      1       0.1%

Value   Count   Frequency (%)
1100    1       0.1%
1099    1       0.1%
1098    1       0.1%
1097    1       0.1%
1096    1       0.1%
1095    1       0.1%
1094    1       0.1%
1093    1       0.1%
1092    1       0.1%
1091    1       0.1%

Class
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.2%
Missing: 100
Missing (%): 9.1%
Memory size: 8.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3000
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value       Count   Frequency (%)
0.0         723     65.7%
1.0         277     25.2%
(Missing)   100     9.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
0.0     723     72.3%
1.0     277     27.7%
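The Common Values table and the pie chart table report different percentages for the same counts. The figures are consistent with Common Values dividing by all 1100 rows (missing included) and the pie chart dividing by the 1000 non-missing rows. The sketch below checks that reading; it is an interpretation of the numbers, not a statement about pandas-profiling internals.

```python
# Counts copied from the Class tables.
counts = {"0.0": 723, "1.0": 277}
n_missing = 100

n_present = sum(counts.values())   # non-missing rows
n_rows = n_present + n_missing     # all observations

shares = {}
for value, count in counts.items():
    share_of_all = 100 * count / n_rows
    share_of_present = 100 * count / n_present
    shares[value] = (round(share_of_all, 1), round(share_of_present, 1))
    print(f"{value}: {share_of_all:.1f}% of all rows, "
          f"{share_of_present:.1f}% of non-missing rows")
```

When comparing class balance, the second figure (72.3% versus 27.7%) is the one that describes the labelled rows.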

Most occurring characters

Value   Count   Frequency (%)
0       1723    57.4%
.       1000    33.3%
1       277     9.2%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      2000    66.7%
Other Punctuation   1000    33.3%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
0       1723    86.2%
1       277     13.9%
Other Punctuation
Value   Count   Frequency (%)
.       1000    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   3000    100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
0       1723    57.4%
.       1000    33.3%
1       277     9.2%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   3000    100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
0       1723    57.4%
.       1000    33.3%
1       277     9.2%
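The character counts for Class follow from the value counts, because the column is stored as the strings "0.0" and "1.0". The sketch below rebuilds them with `collections.Counter`, assuming every non-missing value is one of those two three-character strings.

```python
from collections import Counter

# Value counts copied from the Class tables (missing rows contribute no characters).
value_counts = {"0.0": 723, "1.0": 277}

char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(f"{ch!r}: {n} ({100 * n / total_chars:.1f}%)")
```

Each "0.0" contributes two zeros and each "1.0" contributes one, which is why the character "0" appears 1723 times in only 1000 values.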

C1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 55
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.96272727
Minimum: 18
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.7 KiB

Quantile statistics

Minimum: 18
5-th percentile: 22
Q1: 26
Median: 32
Q3: 41
95-th percentile: 59
Maximum: 75
Range: 57
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.34541117
Coefficient of variation (CV): 0.3245001766
Kurtosis: 0.5665631668
Mean: 34.96272727
Median Absolute Deviation (MAD): 7
Skewness: 1.007872888
Sum: 38459
Variance: 128.7183547
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
26                  60      5.5%
25                  53      4.8%
27                  51      4.6%
22                  46      4.2%
29                  45      4.1%
23                  44      4.0%
35                  43      3.9%
28                  43      3.9%
32                  42      3.8%
24                  42      3.8%
Other values (45)   631     57.4%

Value   Count   Frequency (%)
18      2       0.2%
19      5       0.5%
20      17      1.5%
21      29      2.6%
22      46      4.2%
23      44      4.0%
24      42      3.8%
25      53      4.8%
26      60      5.5%
27      51      4.6%

Value   Count   Frequency (%)
75      1       0.1%
74      3       0.3%
73      2       0.2%
70      1       0.1%
68      1       0.1%
67      3       0.3%
66      4       0.4%
65      7       0.6%
64      4       0.4%
63      7       0.6%

C2
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value   Count   Frequency (%)
True    1058    96.2%
False   42      3.8%

C3
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.5%
Missing: 7
Missing (%): 0.6%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2186
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V5
2nd row: V3
3rd row: V5
4th row: V3
5th row: V3

Common Values

Value       Count   Frequency (%)
V3          370     33.6%
V5          278     25.3%
V2          193     17.5%
V4          183     16.6%
V1          69      6.3%
(Missing)   7       0.6%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v3      370     33.9%
v5      278     25.4%
v2      193     17.7%
v4      183     16.7%
v1      69      6.3%

Most occurring characters

ValueCountFrequency (%)
V1093
50.0%
3370
 
16.9%
5278
 
12.7%
2193
 
8.8%
4183
 
8.4%
169
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1093
50.0%
Decimal Number1093
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3370
33.9%
5278
25.4%
2193
17.7%
4183
16.7%
169
 
6.3%
Uppercase Letter
ValueCountFrequency (%)
V1093
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1093
50.0%
Common1093
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3370
33.9%
5278
25.4%
2193
17.7%
4183
16.7%
169
 
6.3%
Latin
ValueCountFrequency (%)
V1093
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2186
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1093
50.0%
3370
 
16.9%
5278
 
12.7%
2193
 
8.8%
4183
 
8.4%
169
 
3.2%

C4
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 45
Distinct (%): 4.1%
Missing: 7
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.34766697
Minimum: 3
Maximum: 72
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.7 KiB

Quantile statistics

Minimum: 3
5-th percentile: 6
Q1: 11
Median: 18
Q3: 24
95-th percentile: 47
Maximum: 72
Range: 69
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 12.04896476
Coefficient of variation (CV): 0.592154608
Kurtosis: 0.8481559339
Mean: 20.34766697
Median Absolute Deviation (MAD): 7
Skewness: 1.079782464
Sum: 22240
Variance: 145.1775518
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)

Value               Count   Frequency (%)
11                  109     9.9%
24                  102     9.3%
23                  99      9.0%
12                  95      8.6%
17                  65      5.9%
36                  56      5.1%
18                  55      5.0%
9                   47      4.3%
14                  44      4.0%
6                   42      3.8%
Other values (35)   379     34.5%

Value   Count   Frequency (%)
3       4       0.4%
4       3       0.3%
5       41      3.7%
6       42      3.8%
7       5       0.5%
8       31      2.8%
9       47      4.3%
10      21      1.9%
11      109     9.9%
12      95      8.6%

Value   Count   Frequency (%)
72      1       0.1%
60      9       0.8%
59      4       0.4%
54      1       0.1%
53      1       0.1%
48      22      2.0%
47      30      2.7%
46      1       0.1%
45      4       0.4%
44      2       0.2%

C5
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 3
Median length: 2
Mean length: 2.014545455
Min length: 2

Characters and Unicode

Total characters: 2216
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V2
2nd row: V7
3rd row: V2
4th row: V7
5th row: V1

Common Values

Value   Count   Frequency (%)
V3      304     27.6%
V7      261     23.7%
V2      202     18.4%
V1      113     10.3%
V9      104     9.5%
V6      54      4.9%
V5      24      2.2%
V10     16      1.5%
V4      13      1.2%
V8      9       0.8%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v3      304     27.6%
v7      261     23.7%
v2      202     18.4%
v1      113     10.3%
v9      104     9.5%
v6      54      4.9%
v5      24      2.2%
v10     16      1.5%
v4      13      1.2%
v8      9       0.8%

Most occurring characters

ValueCountFrequency (%)
V1100
49.6%
3304
 
13.7%
7261
 
11.8%
2202
 
9.1%
1129
 
5.8%
9104
 
4.7%
654
 
2.4%
524
 
1.1%
016
 
0.7%
413
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1116
50.4%
Uppercase Letter1100
49.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3304
27.2%
7261
23.4%
2202
18.1%
1129
11.6%
9104
 
9.3%
654
 
4.8%
524
 
2.2%
016
 
1.4%
413
 
1.2%
89
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1116
50.4%
Latin1100
49.6%

Most frequent character per script

Common
ValueCountFrequency (%)
3304
27.2%
7261
23.4%
2202
18.1%
1129
11.6%
9104
 
9.3%
654
 
4.8%
524
 
2.2%
016
 
1.4%
413
 
1.2%
89
 
0.8%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2216
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
49.6%
3304
 
13.7%
7261
 
11.8%
2202
 
9.1%
1129
 
5.8%
9104
 
4.7%
654
 
2.4%
524
 
1.1%
016
 
0.7%
413
 
0.6%

C6
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2200
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V1
2nd row: V1
3rd row: V1
4th row: V1
5th row: V1

Common Values

Value   Count   Frequency (%)
V1      670     60.9%
V5      195     17.7%
V2      116     10.5%
V3      69      6.3%
V4      50      4.5%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v1      670     60.9%
v5      195     17.7%
v2      116     10.5%
v3      69      6.3%
v4      50      4.5%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
1670
30.5%
5195
 
8.9%
2116
 
5.3%
369
 
3.1%
450
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1670
60.9%
5195
 
17.7%
2116
 
10.5%
369
 
6.3%
450
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1670
60.9%
5195
 
17.7%
2116
 
10.5%
369
 
6.3%
450
 
4.5%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
1670
30.5%
5195
 
8.9%
2116
 
5.3%
369
 
3.1%
450
 
2.3%

C7
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2200
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V1
2nd row: V1
3rd row: V2
4th row: V1
5th row: V4

Common Values

Value   Count   Frequency (%)
V4      437     39.7%
V1      297     27.0%
V2      295     26.8%
V3      71      6.5%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v4      437     39.7%
v1      297     27.0%
v2      295     26.8%
v3      71      6.5%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
4437
 
19.9%
1297
 
13.5%
2295
 
13.4%
371
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4437
39.7%
1297
27.0%
2295
26.8%
371
 
6.5%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4437
39.7%
1297
27.0%
2295
26.8%
371
 
6.5%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
4437
 
19.9%
1297
 
13.5%
2295
 
13.4%
371
 
3.2%

C8
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2200
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V3
2nd row: V1
3rd row: V3
4th row: V4
5th row: V1

Common Values

Value   Count   Frequency (%)
V4      366     33.3%
V1      310     28.2%
V2      258     23.5%
V3      166     15.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v4      366     33.3%
v1      310     28.2%
v2      258     23.5%
v3      166     15.1%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
4366
 
16.6%
1310
 
14.1%
2258
 
11.7%
3166
 
7.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4366
33.3%
1310
28.2%
2258
23.5%
3166
15.1%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4366
33.3%
1310
28.2%
2258
23.5%
3166
15.1%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
4366
 
16.6%
1310
 
14.1%
2258
 
11.7%
3166
 
7.5%

C9
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 916
Distinct (%): 83.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3265.750909
Minimum: 249
Maximum: 18424
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.7 KiB

Quantile statistics

Minimum: 249
5-th percentile: 705.75
Q1: 1366
Median: 2301.5
Q3: 3967.25
95-th percentile: 9270.35
Maximum: 18424
Range: 18175
Interquartile range (IQR): 2601.25

Descriptive statistics

Standard deviation: 2833.05211
Coefficient of variation (CV): 0.8675040409
Kurtosis: 4.286635288
Mean: 3265.750909
Median Absolute Deviation (MAD): 1077.5
Skewness: 1.959181796
Sum: 3592326
Variance: 8026184.26
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
1533                 4       0.4%
1216                 4       0.4%
1286                 4       0.4%
2383                 3       0.3%
1049                 3       0.3%
2246                 3       0.3%
1408                 3       0.3%
1393                 3       0.3%
4041                 3       0.3%
433                  3       0.3%
Other values (906)   1067    97.0%

Value   Count   Frequency (%)
249     1       0.1%
276     1       0.1%
338     2       0.2%
343     1       0.1%
362     1       0.1%
367     1       0.1%
385     1       0.1%
392     2       0.2%
408     1       0.1%
425     2       0.2%

Value   Count   Frequency (%)
18424   1       0.1%
15944   1       0.1%
15857   2       0.2%
15671   1       0.1%
15652   1       0.1%
14895   1       0.1%
14782   1       0.1%
14554   1       0.1%
14421   1       0.1%
14317   1       0.1%

C10
Boolean

CONSTANT
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value   Count   Frequency (%)
False   1100    100.0%

C11
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 60.0%
Missing: 1095
Missing (%): 99.5%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 10
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 20.0%

Sample

1st row: V2
2nd row: V1
3rd row: V2
4th row: V1
5th row: V3

Common Values

Value       Count   Frequency (%)
V2          2       0.2%
V1          2       0.2%
V3          1       0.1%
(Missing)   1095    99.5%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v2      2       40.0%
v1      2       40.0%
v3      1       20.0%
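The 60.0% distinct and 20.0% unique figures for C11 look large only because they appear to be computed over the five non-missing values rather than all 1100 rows. The sketch below reproduces them, assuming the five Sample rows are the complete set of non-missing values (1100 observations minus 1095 missing).

```python
from collections import Counter

# The five non-missing C11 values, taken from the Sample rows.
# Assumption: these are all of the non-missing values in the column.
values = ["V2", "V1", "V2", "V1", "V3"]
n_rows = 1100

counts = Counter(values)
n_present = len(values)
distinct = len(counts)                                  # different categories seen
unique = sum(1 for c in counts.values() if c == 1)      # categories seen exactly once

distinct_pct = 100 * distinct / n_present
unique_pct = 100 * unique / n_present
missing_pct = 100 * (n_rows - n_present) / n_rows

print(f"Distinct: {distinct} ({distinct_pct:.1f}%)")
print(f"Unique: {unique} ({unique_pct:.1f}%)")
print(f"Missing: {n_rows - n_present} ({missing_pct:.1f}%)")
```

With only five observed values, any correlation reported for C11 rests on very little data.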

Most occurring characters

ValueCountFrequency (%)
V5
50.0%
22
 
20.0%
12
 
20.0%
31
 
10.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter5
50.0%
Decimal Number5
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
22
40.0%
12
40.0%
31
20.0%
Uppercase Letter
ValueCountFrequency (%)
V5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5
50.0%
Common5
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
22
40.0%
12
40.0%
31
20.0%
Latin
ValueCountFrequency (%)
V5
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII10
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V5
50.0%
22
 
20.0%
12
 
20.0%
31
 
10.0%

C12
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2200
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V1
2nd row: V1
3rd row: V2
4th row: V1
5th row: V4

Common Values

Value   Count   Frequency (%)
V4      437     39.7%
V1      297     27.0%
V2      295     26.8%
V3      71      6.5%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v4      437     39.7%
v1      297     27.0%
v2      295     26.8%
v3      71      6.5%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
4437
 
19.9%
1297
 
13.5%
2295
 
13.4%
371
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4437
39.7%
1297
27.0%
2295
26.8%
371
 
6.5%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4437
39.7%
1297
27.0%
2295
26.8%
371
 
6.5%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
4437
 
19.9%
1297
 
13.5%
2295
 
13.4%
371
 
3.2%

C13
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 6
Missing (%): 0.5%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2188
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V3
2nd row: V3
3rd row: V3
4th row: V3
5th row: V3

Common Values

Value       Count   Frequency (%)
V3          896     81.5%
V1          146     13.3%
V2          52      4.7%
(Missing)   6       0.5%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v3      896     81.9%
v1      146     13.3%
v2      52      4.8%

Most occurring characters

ValueCountFrequency (%)
V1094
50.0%
3896
41.0%
1146
 
6.7%
252
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1094
50.0%
Decimal Number1094
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3896
81.9%
1146
 
13.3%
252
 
4.8%
Uppercase Letter
ValueCountFrequency (%)
V1094
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1094
50.0%
Common1094
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3896
81.9%
1146
 
13.3%
252
 
4.8%
Latin
ValueCountFrequency (%)
V1094
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2188
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1094
50.0%
3896
41.0%
1146
 
6.7%
252
 
2.4%

C14
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2200
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: V4
2nd row: V3
3rd row: V4
4th row: V3
5th row: V3

Common Values

Value   Count   Frequency (%)
V3      697     63.4%
V2      219     19.9%
V4      161     14.6%
V1      23      2.1%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
v3      697     63.4%
v2      219     19.9%
v4      161     14.6%
v1      23      2.1%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
3697
31.7%
2219
 
10.0%
4161
 
7.3%
123
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3697
63.4%
2219
 
19.9%
4161
 
14.6%
123
 
2.1%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3697
63.4%
2219
 
19.9%
4161
 
14.6%
123
 
2.1%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
3697
31.7%
2219
 
10.0%
4161
 
7.3%
123
 
1.0%

C15
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3300
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value   Count   Frequency (%)
0.0     1100    100.0%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
0.0     1100    100.0%

Most occurring characters

ValueCountFrequency (%)
02200
66.7%
.1100
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2200
66.7%
Other Punctuation1100
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
02200
100.0%
Other Punctuation
ValueCountFrequency (%)
.1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3300
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
02200
66.7%
.1100
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
02200
66.7%
.1100
33.3%

C16
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 996
Distinct (%): 90.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40530.60818
Minimum: 1446
Maximum: 220716
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.7 KiB

Quantile statistics

Minimum: 1446
5-th percentile: 7894.25
Q1: 19447.75
Median: 33598
Q3: 56142
95-th percentile: 96379.55
Maximum: 220716
Range: 219270
Interquartile range (IQR): 36694.25

Descriptive statistics

Standard deviation: 28221.72522
Coefficient of variation (CV): 0.6963064826
Kurtosis: 2.766538491
Mean: 40530.60818
Median Absolute Deviation (MAD): 16866.5
Skewness: 1.343162899
Sum: 44583669
Variance: 796465774.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
21540                3       0.3%
30964                2       0.2%
25754                2       0.2%
23507                2       0.2%
66099                2       0.2%
43485                2       0.2%
30375                2       0.2%
10633                2       0.2%
26329                2       0.2%
12223                2       0.2%
Other values (986)   1079    98.1%

Value   Count   Frequency (%)
1446    1       0.1%
1560    1       0.1%
1900    1       0.1%
2062    1       0.1%
2296    1       0.1%
2668    1       0.1%
3457    1       0.1%
3546    1       0.1%
4001    1       0.1%
4030    1       0.1%

Value    Count   Frequency (%)
220716   1       0.1%
187204   1       0.1%
157435   2       0.2%
155195   1       0.1%
146439   1       0.1%
145341   1       0.1%
131107   1       0.1%
127113   1       0.1%
126233   1       0.1%
125218   1       0.1%

C17
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3300
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value   Count   Frequency (%)
1.0     1100    100.0%

Length

Histogram of lengths of the category

Pie chart

Value   Count   Frequency (%)
1.0     1100    100.0%

Most occurring characters

ValueCountFrequency (%)
11100
33.3%
.1100
33.3%
01100
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2200
66.7%
Other Punctuation1100
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
11100
50.0%
01100
50.0%
Other Punctuation
ValueCountFrequency (%)
.1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3300
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
11100
33.3%
.1100
33.3%
01100
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11100
33.3%
.1100
33.3%
01100
33.3%

C18
Categorical

Distinct      3
Distinct (%)  0.3%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V1     997
V3     57
V2     46

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters4
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV1
2nd rowV1
3rd rowV1
4th rowV1
5th rowV1

Common Values

Value  Count  Frequency (%)
V1     997    90.6%
V3     57     5.2%
V2     46     4.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v1997
90.6%
v357
 
5.2%
v246
 
4.2%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
1997
45.3%
357
 
2.6%
246
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1997
90.6%
357
 
5.2%
246
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1997
90.6%
357
 
5.2%
246
 
4.2%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
1997
45.3%
357
 
2.6%
246
 
2.1%

C19
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct      856
Distinct (%)  77.8%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          5001.148182
Minimum       2272
Maximum       8633
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   8.7 KiB

Quantile statistics

Minimum                          2272
5-th percentile                  3428.65
Q1                               4326.5
Median                           4969
Q3                               5677
95-th percentile                 6682
Maximum                          8633
Range                            6361
Interquartile range (IQR)        1350.5

Descriptive statistics

Standard deviation               1001.006037
Coefficient of variation (CV)    0.2001552444
Kurtosis                         -0.1480727151
Mean                             5001.148182
Median Absolute Deviation (MAD)  666.5
Skewness                         0.1759786429
Sum                              5501263
Variance                         1002013.085
Monotonicity                     Not monotonic
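
Several of the C19 figures above are derived from the others by simple arithmetic. The following is a minimal sketch of those relationships, not the report generator's own code: the variable names are illustrative, and the inputs are the rounded values printed in the report, so the results agree with the report only to the printed precision.

```python
# Reported C19 statistics, copied from the report (rounded as printed).
n_observations = 1100
minimum, q1, q3, maximum = 2272, 4326.5, 5677, 8633
mean, std = 5001.148182, 1001.006037

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n_observations     # Sum = mean * number of observations

print(value_range, iqr, round(cv, 6), round(variance, 1), round(total))
```

Recomputing these quantities is a quick sanity check when copying statistics out of a report by hand.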
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
4407                4      0.4%
5514                4      0.4%
4030                4      0.4%
5602                4      0.4%
4399                4      0.4%
4926                4      0.4%
5325                4      0.4%
5026                3      0.3%
4593                3      0.3%
5677                3      0.3%
Other values (846)  1063   96.6%

Value               Count  Frequency (%)
2272                1      0.1%
2285                1      0.1%
2536                1      0.1%
2555                1      0.1%
2587                1      0.1%
2609                1      0.1%
2614                1      0.1%
2634                1      0.1%
2644                1      0.1%
2675                1      0.1%

Value               Count  Frequency (%)
8633                1      0.1%
8066                1      0.1%
7805                1      0.1%
7615                1      0.1%
7609                2      0.2%
7605                1      0.1%
7595                1      0.1%
7453                1      0.1%
7437                1      0.1%
7429                1      0.1%

C20
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct      4
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
4      454
2      333
3      168
1      145

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1100
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4
2nd row2
3rd row3
4th row4
5th row4

Common Values

Value  Count  Frequency (%)
4      454    41.3%
2      333    30.3%
3      168    15.3%
1      145    13.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
4454
41.3%
2333
30.3%
3168
 
15.3%
1145
 
13.2%

Most occurring characters

ValueCountFrequency (%)
4454
41.3%
2333
30.3%
3168
 
15.3%
1145
 
13.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1100
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4454
41.3%
2333
30.3%
3168
 
15.3%
1145
 
13.2%

Most occurring scripts

ValueCountFrequency (%)
Common1100
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4454
41.3%
2333
30.3%
3168
 
15.3%
1145
 
13.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII1100
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4454
41.3%
2333
30.3%
3168
 
15.3%
1145
 
13.2%

C21
Categorical

HIGH CORRELATION

Distinct      5
Distinct (%)  0.5%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V2     590
V4     313
V3     102
V1     51
V5     44

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters6
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV2
2nd rowV4
3rd rowV3
4th rowV2
5th rowV3

Common Values

Value  Count  Frequency (%)
V2     590    53.6%
V4     313    28.5%
V3     102    9.3%
V1     51     4.6%
V5     44     4.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v2590
53.6%
v4313
28.5%
v3102
 
9.3%
v151
 
4.6%
v544
 
4.0%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
2590
26.8%
4313
 
14.2%
3102
 
4.6%
151
 
2.3%
544
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2590
53.6%
4313
28.5%
3102
 
9.3%
151
 
4.6%
544
 
4.0%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2590
53.6%
4313
28.5%
3102
 
9.3%
151
 
4.6%
544
 
4.0%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
2590
26.8%
4313
 
14.2%
3102
 
4.6%
151
 
2.3%
544
 
2.0%

C22
Categorical

HIGH CORRELATION

Distinct      2
Distinct (%)  0.2%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V1     652
V2     448

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV2
2nd rowV1
3rd rowV2
4th rowV1
5th rowV1

Common Values

Value  Count  Frequency (%)
V1     652    59.3%
V2     448    40.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v1652
59.3%
v2448
40.7%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
1652
29.6%
2448
20.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1652
59.3%
2448
40.7%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1652
59.3%
2448
40.7%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
1652
29.6%
2448
20.4%

C23
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct      4
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
1      698
2      368
3      28
4      6

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1100
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row1
4th row1
5th row1

Common Values

Value  Count  Frequency (%)
1      698    63.5%
2      368    33.5%
3      28     2.5%
4      6      0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1698
63.5%
2368
33.5%
328
 
2.5%
46
 
0.5%

Most occurring characters

ValueCountFrequency (%)
1698
63.5%
2368
33.5%
328
 
2.5%
46
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1100
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1698
63.5%
2368
33.5%
328
 
2.5%
46
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Common1100
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1698
63.5%
2368
33.5%
328
 
2.5%
46
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1100
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1698
63.5%
2368
33.5%
328
 
2.5%
46
 
0.5%

C24
Categorical

HIGH CORRELATION

Distinct      3
Distinct (%)  0.3%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V2     779
V1     202
V3     119

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters4
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV3
2nd rowV1
3rd rowV2
4th rowV1
5th rowV1

Common Values

Value  Count  Frequency (%)
V2     779    70.8%
V1     202    18.4%
V3     119    10.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v2779
70.8%
v1202
 
18.4%
v3119
 
10.8%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
2779
35.4%
1202
 
9.2%
3119
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2779
70.8%
1202
 
18.4%
3119
 
10.8%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2779
70.8%
1202
 
18.4%
3119
 
10.8%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
2779
35.4%
1202
 
9.2%
3119
 
5.4%

C25
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct      45
Distinct (%)  4.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          20.30818182
Minimum       3
Maximum       72
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   8.7 KiB

Quantile statistics

Minimum                          3
5-th percentile                  6
Q1                               11
Median                           18
Q3                               24
95-th percentile                 47
Maximum                          72
Range                            69
Interquartile range (IQR)        13

Descriptive statistics

Standard deviation               12.03794903
Coefficient of variation (CV)    0.5927635049
Kurtosis                         0.8589707858
Mean                             20.30818182
Median Absolute Deviation (MAD)  7
Skewness                         1.080043858
Sum                              22339
Variance                         144.9122169
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=45)
Value              Count  Frequency (%)
11                 109    9.9%
24                 102    9.3%
23                 101    9.2%
12                 95     8.6%
17                 65     5.9%
36                 56     5.1%
18                 56     5.1%
9                  47     4.3%
14                 44     4.0%
5                  44     4.0%
Other values (35)  381    34.6%

Value              Count  Frequency (%)
3                  4      0.4%
4                  3      0.3%
5                  44     4.0%
6                  42     3.8%
7                  5      0.5%
8                  31     2.8%
9                  47     4.3%
10                 21     1.9%
11                 109    9.9%
12                 95     8.6%

Value              Count  Frequency (%)
72                 1      0.1%
60                 9      0.8%
59                 4      0.4%
54                 1      0.1%
53                 1      0.1%
48                 22     2.0%
47                 30     2.7%
46                 1      0.1%
45                 4      0.4%
44                 2      0.2%

C26
Categorical

HIGH CORRELATION

Distinct      5
Distinct (%)  0.5%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V2     590
V4     313
V3     102
V1     51
V5     44

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters6
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV2
2nd rowV4
3rd rowV3
4th rowV2
5th rowV3

Common Values

Value  Count  Frequency (%)
V2     590    53.6%
V4     313    28.5%
V3     102    9.3%
V1     51     4.6%
V5     44     4.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v2590
53.6%
v4313
28.5%
v3102
 
9.3%
v151
 
4.6%
v544
 
4.0%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
2590
26.8%
4313
 
14.2%
3102
 
4.6%
151
 
2.3%
544
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2590
53.6%
4313
28.5%
3102
 
9.3%
151
 
4.6%
544
 
4.0%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2590
53.6%
4313
28.5%
3102
 
9.3%
151
 
4.6%
544
 
4.0%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
2590
26.8%
4313
 
14.2%
3102
 
4.6%
151
 
2.3%
544
 
2.0%

C27
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct      4
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
4      524
2      256
3      174
1      146

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters1100
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row4
2nd row2
3rd row4
4th row2
5th row4

Common Values

Value  Count  Frequency (%)
4      524    47.6%
2      256    23.3%
3      174    15.8%
1      146    13.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
4524
47.6%
2256
23.3%
3174
 
15.8%
1146
 
13.3%

Most occurring characters

ValueCountFrequency (%)
4524
47.6%
2256
23.3%
3174
 
15.8%
1146
 
13.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1100
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4524
47.6%
2256
23.3%
3174
 
15.8%
1146
 
13.3%

Most occurring scripts

ValueCountFrequency (%)
Common1100
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
4524
47.6%
2256
23.3%
3174
 
15.8%
1146
 
13.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1100
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4524
47.6%
2256
23.3%
3174
 
15.8%
1146
 
13.3%

C28
Categorical

HIGH CORRELATION

Distinct      4
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Memory size   8.7 KiB

Value  Count
V3     595
V2     346
V4     102
V1     57

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters2200
Distinct characters5
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowV3
2nd rowV3
3rd rowV1
4th rowV2
5th rowV4

Common Values

Value  Count  Frequency (%)
V3     595    54.1%
V2     346    31.5%
V4     102    9.3%
V1     57     5.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
v3595
54.1%
v2346
31.5%
v4102
 
9.3%
v157
 
5.2%

Most occurring characters

ValueCountFrequency (%)
V1100
50.0%
3595
27.0%
2346
 
15.7%
4102
 
4.6%
157
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1100
50.0%
Decimal Number1100
50.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3595
54.1%
2346
31.5%
4102
 
9.3%
157
 
5.2%
Uppercase Letter
ValueCountFrequency (%)
V1100
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1100
50.0%
Common1100
50.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3595
54.1%
2346
31.5%
4102
 
9.3%
157
 
5.2%
Latin
ValueCountFrequency (%)
V1100
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
V1100
50.0%
3595
27.0%
2346
 
15.7%
4102
 
4.6%
157
 
2.6%

C29
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct      2
Distinct (%)  0.2%
Missing       6
Missing (%)   0.5%
Memory size   8.7 KiB

Value  Count
1.0    931
2.0    163

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters3282
Distinct characters4
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row2.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

Value      Count  Frequency (%)
1.0        931    84.6%
2.0        163    14.8%
(Missing)  6      0.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1.0931
85.1%
2.0163
 
14.9%

Most occurring characters

ValueCountFrequency (%)
.1094
33.3%
01094
33.3%
1931
28.4%
2163
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2188
66.7%
Other Punctuation1094
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01094
50.0%
1931
42.6%
2163
 
7.4%
Other Punctuation
ValueCountFrequency (%)
.1094
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3282
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.1094
33.3%
01094
33.3%
1931
28.4%
2163
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3282
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.1094
33.3%
01094
33.3%
1931
28.4%
2163
 
5.0%

C30
Boolean

CONSTANT
REJECTED

Distinct      1
Distinct (%)  0.1%
Missing       0
Missing (%)   0.0%
Memory size   1.2 KiB

Value  Count  Frequency (%)
True   1100   100.0%

C31
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct      916
Distinct (%)  83.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          3265.750909
Minimum       249
Maximum       18424
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   8.7 KiB

Quantile statistics

Minimum                          249
5-th percentile                  705.75
Q1                               1366
Median                           2301.5
Q3                               3967.25
95-th percentile                 9270.35
Maximum                          18424
Range                            18175
Interquartile range (IQR)        2601.25

Descriptive statistics

Standard deviation               2833.05211
Coefficient of variation (CV)    0.8675040409
Kurtosis                         4.286635288
Mean                             3265.750909
Median Absolute Deviation (MAD)  1077.5
Skewness                         1.959181796
Sum                              3592326
Variance                         8026184.26
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1533                4      0.4%
1216                4      0.4%
1286                4      0.4%
2383                3      0.3%
1049                3      0.3%
2246                3      0.3%
1408                3      0.3%
1393                3      0.3%
4041                3      0.3%
433                 3      0.3%
Other values (906)  1067   97.0%

Value               Count  Frequency (%)
249                 1      0.1%
276                 1      0.1%
338                 2      0.2%
343                 1      0.1%
362                 1      0.1%
367                 1      0.1%
385                 1      0.1%
392                 2      0.2%
408                 1      0.1%
425                 2      0.2%

Value               Count  Frequency (%)
18424               1      0.1%
15944               1      0.1%
15857               2      0.2%
15671               1      0.1%
15652               1      0.1%
14895               1      0.1%
14782               1      0.1%
14554               1      0.1%
14421               1      0.1%
14317               1      0.1%

C32
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING
UNIFORM

Distinct      5
Distinct (%)  100.0%
Missing       1095
Missing (%)   99.5%
Memory size   8.7 KiB

Values: 5.0, 2.0, 1.0, 4.0, 3.0

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters15
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)100.0%

Sample

1st row5.0
2nd row2.0
3rd row1.0
4th row4.0
5th row3.0

Common Values

Value      Count  Frequency (%)
5.0        1      0.1%
2.0        1      0.1%
1.0        1      0.1%
4.0        1      0.1%
3.0        1      0.1%
(Missing)  1095   99.5%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
5.01
20.0%
2.01
20.0%
1.01
20.0%
4.01
20.0%
3.01
20.0%

Most occurring characters

ValueCountFrequency (%)
.5
33.3%
05
33.3%
51
 
6.7%
21
 
6.7%
11
 
6.7%
41
 
6.7%
31
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number10
66.7%
Other Punctuation5
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
05
50.0%
51
 
10.0%
21
 
10.0%
11
 
10.0%
41
 
10.0%
31
 
10.0%
Other Punctuation
ValueCountFrequency (%)
.5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common15
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.5
33.3%
05
33.3%
51
 
6.7%
21
 
6.7%
11
 
6.7%
41
 
6.7%
31
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII15
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.5
33.3%
05
33.3%
51
 
6.7%
21
 
6.7%
11
 
6.7%
41
 
6.7%
31
 
6.7%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the slope of the line affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
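
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code the report generator runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample covariance and standard deviations (n - 1 denominator). The
    # denominators cancel in the ratio, so population formulas give the same r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)

# Hypothetical data: ys is roughly linear in xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(xs, ys)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10.0 * x + 3.0 for x in xs], ys)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned in the description.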

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
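
As a rough sketch of this recipe in plain Python: rank both variables, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are hypothetical, and assigning tied values their average rank is an assumption, although it is the usual convention.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: rho is 1 while r is below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```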

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
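
A minimal sketch of this pair-counting definition follows. The function name `kendall_tau` and the sample data are illustrative assumptions. The code implements exactly the formula stated above, in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie-corrected variant, so their results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Hypothetical data: 5 concordant pairs and 1 discordant pair out of 6.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau(xs, ys)
print(tau)
```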

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
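
The missing-value counts and the nullity correlation can be illustrated on toy data. The sketch below assumes missing entries are represented by `None`; the helper names `missing_share` and `nullity_correlation` and the toy columns are hypothetical, and the code is not the implementation behind the plots above.

```python
import math

def missing_share(column):
    """Return (number of missing entries, percentage of entries missing)."""
    missing = sum(1 for value in column if value is None)
    return missing, 100.0 * missing / len(column)

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/absent indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the indicator then has zero variance.
    """
    a = [0.0 if value is None else 1.0 for value in col_a]
    b = [0.0 if value is None else 1.0 for value in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Toy columns: col_b is missing exactly where col_a is; col_c is the opposite.
col_a = [1, None, 3, None]
col_b = ["x", None, "y", None]
col_c = [None, 1, None, 2]

print(missing_share(col_a))               # count and percentage for one column
print(nullity_correlation(col_a, col_b))  # missing together
print(nullity_correlation(col_a, col_c))  # one present when the other is missing

# The same arithmetic reproduces the C32 figure: 1095 of 1100 values missing.
c32_like = [None] * 1095 + [1.0] * 5
print(round(missing_share(c32_like)[1], 1))
```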

Sample

First rows

      ID    Class  C1  C2   C3  C4    C5  C6  C7  C8  C9     C10  C11  C12  C13  C14  C15  C16     C17  C18  C19   C20  C21  C22  C23  C24  C25  C26  C27  C28  C29  C30  C31    C32
0     1     1.0    53  yes  V5  11.0  V2  V1  V1  V3  7865   F    NaN  V1   V3   V4   0.0  30964   1.0  V1   3824  4    V2   V2   1    V3   11   V2   4    V3   1.0  T    7865   NaN
1     2     0.0    35  yes  V3  11.0  V7  V1  V1  V1  3904   F    NaN  V1   V3   V3   0.0  157435  1.0  V1   5160  2    V4   V1   2    V1   11   V4   2    V3   2.0  T    3904   NaN
2     3     1.0    40  yes  V5  18.0  V2  V1  V2  V3  4296   F    NaN  V2   V3   V4   0.0  21999   1.0  V1   3720  3    V3   V2   1    V2   18   V3   4    V1   1.0  T    4296   NaN
3     4     0.0    28  yes  V3  14.0  V7  V1  V1  V4  1402   F    NaN  V1   V3   V3   0.0  56353   1.0  V1   6245  4    V2   V1   1    V1   14   V2   2    V2   1.0  T    1402   NaN
4     5     0.0    40  yes  V3  11.0  V1  V1  V4  V1  1503   F    NaN  V4   V3   V3   0.0  6160    1.0  V1   5496  4    V3   V1   1    V1   11   V3   4    V4   1.0  T    1503   NaN
5     6     0.0    25  yes  V2  11.0  V3  V1  V2  V1  624    F    NaN  V2   V1   V2   0.0  13282   1.0  V3   4030  1    V2   V1   1    V2   11   V2   4    V4   1.0  T    624    NaN
6     7     0.0    61  yes  V3  6.0   V4  V3  V4  V1  1337   F    NaN  V4   V3   V3   0.0  58376   1.0  V1   4304  4    V2   V1   1    V2   6    V2   1    V1   1.0  T    1337   NaN
7     8     1.0    24  yes  V2  12.0  V2  V1  V2  V2  2968   F    NaN  V2   V3   V3   0.0  69444   1.0  V1   5968  3    V5   V1   2    V1   12   V5   4    V2   1.0  T    2968   NaN
8     9     0.0    25  yes  V3  11.0  V2  V1  V4  V1  762    F    NaN  V4   V3   V3   0.0  25653   1.0  V1   7189  1    V2   V2   1    V2   11   V2   4    V2   1.0  T    762    NaN
9     10    0.0    44  yes  V5  36.0  V7  V1  V4  V4  10874  F    NaN  V4   V3   V3   0.0  13784   1.0  V1   4399  2    V3   V2   2    V2   36   V3   2    V3   2.0  T    10874  NaN

Last rows

      ID    Class  C1  C2   C3  C4    C5  C6  C7  C8  C9     C10  C11  C12  C13  C14  C15  C16     C17  C18  C19   C20  C21  C22  C23  C24  C25  C26  C27  C28  C29  C30  C31    C32
1090  1091  NaN    43  yes  V2  17.0  V2  V1  V4  V2  1533   F    NaN  V4   V3   V2   0.0  44754   1.0  V2   4487  1    V2   V1   1    V2   17   V2   4    V4   2.0  T    1533   NaN
1091  1092  NaN    25  yes  V3  18.0  V3  V1  V1  V1  1344   F    NaN  V1   V1   V3   0.0  30397   1.0  V1   7370  3    V2   V1   1    V2   18   V2   4    V4   1.0  T    1344   NaN
1092  1093  NaN    29  yes  V5  21.0  V7  V1  V1  V4  1601   F    NaN  V1   V3   V3   0.0  25331   1.0  V1   5841  3    V4   V2   2    V2   21   V4   4    V4   1.0  T    1601   NaN
1093  1094  NaN    46  yes  V5  24.0  V7  V1  V4  V4  2538   F    NaN  V4   V3   V2   0.0  37402   1.0  V1   3345  4    V3   V1   2    V2   24   V3   4    V3   2.0  T    2538   NaN
1094  1095  NaN    22  yes  V2  13.0  V3  V1  V2  V2  2100   F    NaN  V2   V3   V2   0.0  14812   1.0  V3   6116  4    V2   V1   1    V2   13   V2   2    V2   1.0  T    2100   NaN
1095  1096  NaN    31  yes  V3  48.0  V3  V1  V1  V4  6758   F    NaN  V1   V3   V3   0.0  23786   1.0  V1   4200  2    V2   V2   1    V2   48   V2   3    V2   1.0  T    6758   NaN
1096  1097  NaN    39  yes  V5  22.0  V3  V3  V4  V4  2674   F    NaN  V4   V3   V3   0.0  43446   1.0  V1   4844  4    V2   V1   1    V2   22   V2   3    V3   1.0  T    2674   NaN
1097  1098  NaN    23  yes  V3  17.0  V2  V1  V1  V1  2123   F    NaN  V1   V3   V3   0.0  18760   1.0  V1   4044  4    V4   V1   2    V1   17   V4   4    V2   1.0  T    2123   NaN
1098  1099  NaN    25  no   V2  5.0   V3  V1  V2  V1  589    F    NaN  V2   V3   V2   0.0  35026   1.0  V1   5054  3    V2   V1   1    V2   5    V2   3    V4   1.0  T    589    NaN
1099  1100  NaN    34  yes  V3  7.0   V3  V1  V2  V1  2576   F    NaN  V2   V3   V3   0.0  66466   1.0  V3   5034  2    V2   V1   1    V2   7    V2   2    V3   1.0  T    2576   NaN